Vehicle Detection

I build an SVM classifier to distinguish vehicles from non-vehicles.

Outline:

  1. Collect summary statistics on data.
  2. Explore and define color features.
  3. Define Histogram of Oriented Gradient features.
  4. Extract and normalize features.
  5. Build and train an SVM classifier.
  6. Search for vehicles in an image using a sliding window search.
  7. Combine overlapping windows & eliminate false positives.
  8. Generate final video output.

1. Collect summary statistics on data.

In [1]:
# Adapted from "Project: Vehicle Detection and Tracking, 19. Data Exploration".

import matplotlib.pyplot as plt
import glob

# Images are divided up into vehicles and non-vehicles
cars = glob.glob('./vehicles/*/*.png')
notcars = glob.glob('./non-vehicles/*/*.png')
        
# Define a function to return some characteristics of the dataset 
def data_look(car_list, notcar_list):
    data_dict = {}
    # Define a key in data_dict "n_cars" and store the number of car images
    data_dict["n_cars"] = len(car_list)
    # Define a key "n_notcars" and store the number of notcar images
    data_dict["n_notcars"] = len(notcar_list)
    # Read in a test image, either car or notcar
    img = plt.imread(car_list[0])
    # Define a key "image_shape" and store the test image shape 3-tuple
    data_dict["image_shape"] = img.shape
    # Define a key "data_type" and store the data type of the test image.
    data_dict["data_type"] = type(img[0, 0, 0])
    # Return data_dict
    return data_dict
    
data_info = data_look(cars, notcars)

print('The data has a count of', 
      data_info["n_cars"], 'cars and', 
      data_info["n_notcars"], 'non-cars')
print('of size:', data_info["image_shape"],
      'and data type:', data_info["data_type"])
The data has a count of 8792 cars and 8968 non-cars
of size: (64, 64, 3) and data type: <class 'numpy.float32'>

The dataset does not suffer from a significant class imbalance.
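As a quick sanity check on balance (a small arithmetic sketch using the counts printed above):

```python
# Class-balance sanity check using the counts reported above.
n_cars, n_notcars = 8792, 8968
ratio = n_cars / (n_cars + n_notcars)
print(f"cars make up {ratio:.1%} of the dataset")  # roughly 49.5%
```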

2. Explore and define color features.

In [2]:
# Adapted from "Project: Vehicle Detection and Tracking, 15. Explore Color Spaces".

import cv2
import numpy as np
from mpl_toolkits.mplot3d import Axes3D

def plot3d(pixels, colors_rgb, axis_labels=list("RGB"), axis_limits=((0, 255), (0, 255), (0, 255))):
    """Plot pixels in 3D."""

    # Create figure and 3D axes
    fig = plt.figure(figsize=(8, 8))
    ax = Axes3D(fig)

    # Set axis limits
    ax.set_xlim(*axis_limits[0])
    ax.set_ylim(*axis_limits[1])
    ax.set_zlim(*axis_limits[2])

    # Set axis labels and sizes
    ax.tick_params(axis='both', which='major', labelsize=14, pad=8)
    ax.set_xlabel(axis_labels[0], fontsize=16, labelpad=16)
    ax.set_ylabel(axis_labels[1], fontsize=16, labelpad=16)
    ax.set_zlabel(axis_labels[2], fontsize=16, labelpad=16)

    # Plot pixel values with colors given in colors_rgb
    ax.scatter(
        pixels[:, :, 0].ravel(),
        pixels[:, :, 1].ravel(),
        pixels[:, :, 2].ravel(),
        c=colors_rgb.reshape((-1, 3)),
        edgecolors='none')

    return ax  # return Axes3D object for further manipulation

def plot_colors(img_path, img_figsize=(15, 15)):
    """Plots img_path and a sample of its pixels in various color spaces."""
    # Read a color image
    img = cv2.imread(img_path)
    plt.figure(figsize=img_figsize)
    plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

    # Select a small fraction of pixels to plot by subsampling it
    scale = max(img.shape[0], img.shape[1], 64) / 64  # at most 64 rows and columns
    img_small = cv2.resize(img, (int(img.shape[1] / scale), int(img.shape[0] / scale)),
                           interpolation=cv2.INTER_NEAREST)

    # Convert subsampled image to desired color space(s)
    img_small_RGB = cv2.cvtColor(img_small, cv2.COLOR_BGR2RGB)  # OpenCV uses BGR, matplotlib likes RGB
    img_small_rgb = img_small_RGB / 255.  # scaled to [0, 1], only for plotting
    img_small_HLS = cv2.cvtColor(img_small, cv2.COLOR_BGR2HLS)
    img_small_HSV = cv2.cvtColor(img_small, cv2.COLOR_BGR2HSV)
    img_small_LUV = cv2.cvtColor(img_small, cv2.COLOR_BGR2Luv)
    img_small_YUV = cv2.cvtColor(img_small, cv2.COLOR_BGR2YUV)
    img_small_YCrCb = cv2.cvtColor(img_small, cv2.COLOR_BGR2YCrCb)

    # Plot and show data.
    plot3d(img_small_RGB, img_small_rgb)
    plt.show()
    plot3d(img_small_HLS, img_small_rgb, axis_labels=list("HLS"), axis_limits=((0, 179), (0, 255), (0, 255)))
    plt.show()
    plot3d(img_small_HSV, img_small_rgb, axis_labels=list("HSV"), axis_limits=((0, 179), (0, 255), (0, 255)))
    plt.show()
    plot3d(img_small_LUV, img_small_rgb, axis_labels=list("LUV"))
    plt.show()
    plot3d(img_small_YUV, img_small_rgb, axis_labels=list("YUV"))
    plt.show()
    plot3d(img_small_YCrCb, img_small_rgb, axis_labels=list("YCrCb"))
    plt.show()
           
plot_colors('./exploration/000275.png')

The image has black, red, and white cars. HLS color space seems to do the best job emphasizing the separation of these points from the rest of the image. HSV color space also performs well, but since HSV and HLS encode similar information in different orders, I'd like to pick one set of color features. Including both would be redundant.

In [3]:
plot_colors('./exploration/000528.png')

This image has some white (?) cars in the distance and a large black car in the lower right corner. The HLS and HSV color spaces do a good job emphasizing the black car, but HLS does a better job keeping the collection of black points in a single cluster.

In [4]:
plot_colors('./exploration/001240.png')

This image has two nearby white cars, a distant white car, a black car, and a red car. The red car isn't very distinct in any of the color space 3D plots, but HLS color space still seems to do the best job separating out the white and black cars.

In [5]:
plot_colors('./exploration/yellow_car.png', img_figsize=(2, 2))

HLS color space separates the yellow points from the black points the best (diagonally), and also does a good job separating the yellow car from the pale blue background points.

In [6]:
plot_colors('./exploration/white_car.png', img_figsize=(2, 2))

HLS again separates the white car from the black points the best, but has the defect of treating its red backup lights as separate objects from the car.

In [7]:
plot_colors('./exploration/red_car.png', img_figsize=(2, 2))

On this red car, HSV color space does a better job than HLS of keeping the car in a single cluster.

In [8]:
plot_colors('./exploration/road.png', img_figsize=(2, 2))

The image is of a patch of road, which is pretty uniform in color and shows up in all of the plots as a single cluster.

In [9]:
plot_colors('./exploration/sky.png', img_figsize=(2, 2))

This image is of a patch of sky and part of a tree. The HSV and HLS color spaces separate out these two components. Otherwise, the points in the plots are tightly clustered, unlike in the *_car.png images.

In [10]:
plot_colors('./exploration/building.png', img_figsize=(2, 2))

In HLS and HSV color space, this image has blue and red streaks similar to the *_car.png images, although less pronounced. A case like this might give rise to a false positive classification.

Overall, HLS color space seems best suited for distinguishing cars from other objects in a scene.

I next define a bin_spatial() function to downsample an image to `size`, collect the color-channel values, and flatten the result into a feature vector.

In [11]:
# Adapted from "Project: Vehicle Detection and Tracking, 16. Spatial Binning of Color".

SPATIAL = 16

def convert_color(img, color_space='HLS'):
    """Convert color from RGB color space to color_space color space."""
    # Apply color conversion if other than 'RGB'
    if color_space == 'RGB':
        feature_img = np.copy(img)
    elif color_space == 'HLS':
        feature_img = cv2.cvtColor(img, cv2.COLOR_RGB2HLS)
    elif color_space == 'HSV':
        feature_img = cv2.cvtColor(img, cv2.COLOR_RGB2HSV)
    elif color_space == 'LUV':
        feature_img = cv2.cvtColor(img, cv2.COLOR_RGB2Luv)
    elif color_space == 'YUV':
        feature_img = cv2.cvtColor(img, cv2.COLOR_RGB2YUV)
    elif color_space == 'YCrCb':
        feature_img = cv2.cvtColor(img, cv2.COLOR_RGB2YCrCb)
    else:
        raise ValueError('Unsupported color space: ' + color_space)
    return feature_img

# Define a function to compute spatially binned color features.
# Pass the color_space flag as a three-letter string like 'HSV' or 'LUV'.
# Note: images read with cv2.imread() are in BGR order, so convert to RGB
# before calling this function.
def bin_spatial(img, color_space='HLS', size=(10, 10)):
    feature_img = convert_color(img, color_space=color_space)
    # Use cv2.resize().ravel() to create the feature vector
    features = cv2.resize(feature_img, size).ravel()
    # Return the feature vector
    return features

# View a random car's spatial bins.
plt.figure(figsize=(15, 15))
ind = np.random.randint(0, len(cars))
img = plt.imread(cars[ind])
plt.subplot(121).set_title("Original")
plt.imshow(img)
feat = bin_spatial(img, size=(SPATIAL, SPATIAL))
plt.subplot(122).set_title("Resized")
# Convert features back into RGB color space.
plt.imshow(cv2.cvtColor(feat.reshape((SPATIAL, SPATIAL, 3)), cv2.COLOR_HLS2RGB))
plt.show()

A resolution of about 10x10 is about as low as we can go while still retaining a recognizable car shape. To divide evenly into 64x64, let's go with 16x16.

Our 3D plots showed that the L and S channels are most relevant for picking out cars, so let's include just these two channels.

In [12]:
# Bins lightness and saturation information in a `size` matrix, which is then
# flattened into a feature vector.
def bin_spatial_ls(img, size=(16, 16)):
    feature_img = convert_color(img, color_space='HLS')
    # Use cv2.resize().ravel() to create the feature vector
    features = cv2.resize(feature_img, size)[:, :, (1, 2)].ravel()
    # Return the feature vector
    return features
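As a rough check of the feature count, here's a numpy-only sketch of the same binning idea, with stride-based subsampling standing in for cv2.resize (an illustration, not the pipeline code):

```python
import numpy as np

# A synthetic 64x64 HLS image stands in for a converted camera patch.
hls = np.zeros((64, 64, 3), dtype=np.uint8)

# Subsample every 4th pixel to get 16x16, keep only the L and S channels
# (indices 1 and 2 in HLS), then flatten -- mirroring bin_spatial_ls().
features = hls[::4, ::4, (1, 2)].ravel()
print(features.shape)  # (512,) = 16 * 16 * 2
```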

3. Define Histogram of Oriented Gradient features.

In [13]:
# Adapted from "Project: Vehicle Detection and Tracking, 20. scikit-image HOG".

from skimage.feature import hog

ORIENT = 9
PIX_PER_CELL = 8
CELL_PER_BLOCK = 2

# Define a function to return HOG features and, optionally, a visualization
def get_hog_features_one_channel(img, orient, pix_per_cell, cell_per_block,
                                 vis=False, feature_vec=True, transform_sqrt=True):
    # skimage.hog() returns (features, hog_image) when visualise=True,
    # and just the features otherwise.
    return hog(img,
               orientations=orient,
               pixels_per_cell=(pix_per_cell, pix_per_cell),
               cells_per_block=(cell_per_block, cell_per_block),
               visualise=vis,
               feature_vector=feature_vec,
               block_norm="L2-Hys",
               transform_sqrt=transform_sqrt)

# View a random car's HOG visualization.
ind = np.random.randint(0, len(cars))
img = plt.imread(cars[ind])
gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)

# Get HOG features for each color channel.
_, gray_hog_img = get_hog_features_one_channel(gray, ORIENT, PIX_PER_CELL, CELL_PER_BLOCK,
                                               vis=True, feature_vec=False)

# Display the original image and each color channel's HOG visualization.
plt.figure(figsize=(15, 15))
plt.subplot(1, 3, 1).set_title('Example Car Image')
plt.imshow(img)
plt.subplot(1, 3, 2).set_title('Grayscale Car Image')
plt.imshow(gray, cmap='gray')
plt.subplot(1, 3, 3).set_title('HOG Visualization')
plt.imshow(gray_hog_img, cmap='gray')
plt.show()

To minimize the number of features, I push the pixels-per-cell value as high as possible. A value of 12 is about as high as it can go before resolution begins to suffer; to divide evenly into 64, let's use 8.

I only care about shape information, so I grayscale the image to reduce it to a single channel, minimizing the number of features while maintaining contrast.
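These choices can be checked against the feature-vector length reported during training. With 64x64 images, 8 pixels per cell, 2x2 cells per block, a one-cell block stride, and 9 orientation bins, the arithmetic works out as follows:

```python
# Blocks slide one cell at a time across an 8x8 grid of cells.
img_size, pix_per_cell, cell_per_block, orient = 64, 8, 2, 9
cells = img_size // pix_per_cell                        # 8 cells per side
blocks = cells - cell_per_block + 1                     # 7 block positions per side
hog_len = blocks * blocks * cell_per_block**2 * orient  # 7*7*4*9 = 1764
spatial_len = 16 * 16 * 2                               # L and S channels of 16x16 bins
print(hog_len, spatial_len, hog_len + spatial_len)      # 1764 512 2276
```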

4. Extract and normalize features.

In [14]:
# Adapted from "Project: Vehicle Detection and Tracking, 22. Combine and Normalize Features" & "29. HOG Classify."

import matplotlib.image as mpimg
from sklearn.preprocessing import StandardScaler

# Extracts spatial bin and HOG features from one image.
def extract_image_features(img, orient, pix_per_cell, cell_per_block, spatial_size=(16, 16)):    
    # Get spatial color features.
    spatial_features = bin_spatial_ls(img, size=spatial_size)
    # Get HOG features.
    gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
    hog_features = get_hog_features_one_channel(gray, orient, pix_per_cell, cell_per_block,
                                    transform_sqrt=True)
    # Return complete feature vector.
    return np.concatenate((spatial_features, hog_features))

# Extracts spatial bin and HOG features from a list of images.
def extract_features(img_path_list, orient, pix_per_cell, cell_per_block, spatial_size=(16, 16)):
    # Create a list to append feature vectors to
    features = []
    # Iterate through the list of images
    for path in img_path_list:
        # Read in each one by one
        img = plt.imread(path)
        # Add features to feature vector list.
        features.append(
            extract_image_features(img, orient, pix_per_cell, cell_per_block, spatial_size=spatial_size))
    # Return list of feature vectors
    return features


car_features = extract_features(cars, ORIENT, PIX_PER_CELL, CELL_PER_BLOCK, spatial_size=(SPATIAL, SPATIAL))
notcar_features = extract_features(notcars, ORIENT, PIX_PER_CELL, CELL_PER_BLOCK, spatial_size=(SPATIAL, SPATIAL))

# Create an array stack of feature vectors
X = np.vstack((car_features, notcar_features)).astype(np.float64)                        
# Fit a per-column scaler.
X_scaler = StandardScaler().fit(X)
# Apply the scaler to X.
scaled_X = X_scaler.transform(X)

# Define the labels vector
y = np.hstack((np.ones(len(car_features)), np.zeros(len(notcar_features))))

# Plot an example of raw and scaled features
ind = np.random.randint(0, len(cars))
fig = plt.figure(figsize=(12,4))
plt.subplot(131)
plt.imshow(mpimg.imread(cars[ind]))
plt.title('Original Image')
plt.subplot(132)
plt.plot(X[ind])
plt.title('Raw Features')
plt.subplot(133)
plt.plot(scaled_X[ind])
plt.title('Normalized Features')
fig.tight_layout()
plt.show()
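StandardScaler fits a per-column mean and standard deviation and applies (x - mean) / std. A minimal numpy sketch of the same transform on synthetic data:

```python
import numpy as np

X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])

# Column-wise standardization, as StandardScaler does.
mu = X.mean(axis=0)
sigma = X.std(axis=0)
scaled = (X - mu) / sigma
print(scaled.mean(axis=0))  # ~[0, 0]
print(scaled.std(axis=0))   # [1, 1]
```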

5. Build and train an SVM classifier.

In [18]:
# Adapted from "Project: Vehicle Tracking, 28. Color Classify".

import time
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split, GridSearchCV


# Split up data into randomized training and test sets
rand_state = np.random.randint(0, 100)
X_train, X_test, y_train, y_test = train_test_split(scaled_X, y, test_size=0.2, random_state=rand_state)

print('Using spatial binning of:', (SPATIAL, SPATIAL))
print('and HOG features with', ORIENT, 'orientation bins,', PIX_PER_CELL, 'pixels per cell,')
print('and', CELL_PER_BLOCK, 'cells per block')
print('Feature vector length:', len(X_train[0]))

# Set up grid search.
parameters = {'kernel': ['linear', 'rbf'], 'C': [5, 10, 15]}
svr = SVC(probability=True)
clf = GridSearchCV(svr, parameters)

# Check the training time for the SVC
t = time.time()
clf.fit(X_train, y_train)
t2 = time.time()
print(round(t2 - t, 2), 'Seconds to train SVC...')

# Check the score of the SVC
print('Test Accuracy of SVC = ', round(clf.score(X_test, y_test), 4))

# Report best parameter values (to speed up future training).
print(clf.best_params_, 'are the best parameter values.')
Using spatial binning of: (16, 16)
and HOG features with 9 orientation bins, 8 pixels per cell,
and 2 cells per block
Feature vector length: 2276
4283.6 Seconds to train SVC...
Test Accuracy of SVC =  0.9825
{'C': 15, 'kernel': 'rbf'} are the best parameter values.
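The grid search settled on an RBF kernel, which scores similarity as k(x, y) = exp(-gamma * ||x - y||^2). A minimal sketch (the gamma value here is illustrative; scikit-learn chooses its own default):

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.1):
    # Similarity decays with squared Euclidean distance.
    return np.exp(-gamma * np.sum((x - y) ** 2))

x = np.array([1.0, 2.0])
print(rbf_kernel(x, x))                     # 1.0: identical vectors
print(rbf_kernel(x, np.array([3.0, 4.0])))  # smaller for distant vectors
```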

6. Search for vehicles in an image using a sliding window search.

In [19]:
# Adapted from "Project: Vehicle Tracking, 34. Search and Classify" & "35. Hog Sub-sampling Window Search".

PROB_THRESH = 0.99

# Define a function to draw bounding boxes.
# `bboxes` is a list of tuples of bounding box opposing corners.
def draw_boxes(img, bboxes, color=(0, 0, 255), thick=6):
    # Make a copy of the image
    imcopy = np.copy(img)
    # Iterate through the bounding boxes
    for bbox in bboxes:
        # Draw a rectangle given bbox coordinates
        cv2.rectangle(imcopy, bbox[0], bbox[1], color, thick)
    # Return the image copy with boxes drawn
    return imcopy

# Define a single function that can extract features using HOG sub-sampling
# and also make predictions about where cars are located in an image.
#
# The function works as follows:
#   1) HOG features are computed once for the entire search region of the image.
#   2) Then sliding windows of different magnifications are applied over the search region.
#      HOG features for each window are found by sub-selecting from the complete set of HOG features.
#      Color bin features are also computed for the window.
#   3) The features for the window are normalized and passed to a classifier to predict whether,
#      with high probability, the patch of image within the window contains a car or not.
#   4) If a patch is predicted to contain a car, that window is recorded.
#   5) The list of all windows predicted to contain cars is returned.
#
# `ystart` and `ystop` define a y-axis range to search.
# `scale` is the magnification to apply to the image prior to searching.
# `clf` is a car/not-car classifier.
# `X_scaler` is a feature normalizer.
# `orient` is the number of orientation bins for HOG features.
# `pix_per_cell` is the number of pixels per cell for HOG features.
# `cell_per_block` is the number of cells per block for HOG features.
# `spatial_size` is the dimensions of the spatial color bins.
def find_cars(img, ystart, ystop, scale, clf, X_scaler, orient, pix_per_cell, cell_per_block, spatial_size):

    # Keep the search region in RGB (convert_color() with 'RGB' just copies);
    # bin_spatial_ls() converts each window to HLS internally.
    ctrans_img = convert_color(img, color_space='RGB')
    ctrans_tosearch = ctrans_img[ystart:ystop,:,:]
    # Rescale the search region; scale > 1 shrinks it, so each 64x64
    # window effectively covers a larger area of the original image.
    if scale != 1:
        imshape = ctrans_tosearch.shape
        ctrans_tosearch = cv2.resize(ctrans_tosearch, (int(imshape[1]/scale), int(imshape[0]/scale)))
    
    # HOG features will use the grayscaled search region.
    # Spatial bin features will use the RGB search region.
    gray = cv2.cvtColor(ctrans_tosearch, cv2.COLOR_RGB2GRAY)

    # Define blocks and steps.
    nxblocks = (gray.shape[1] // pix_per_cell) - cell_per_block + 1
    nyblocks = (gray.shape[0] // pix_per_cell) - cell_per_block + 1 
    window = 64  # 64 was the original sampling rate, with 8 cells and 8 pix per cell
    nblocks_per_window = (window // pix_per_cell) - cell_per_block + 1
    cells_per_step = 1  # Instead of overlap, define how many cells to step
    nxsteps = (nxblocks - nblocks_per_window) // cells_per_step + 1
    nysteps = (nyblocks - nblocks_per_window) // cells_per_step + 1
    
    # Compute individual channel HOG features for the entire image
    hog = get_hog_features_one_channel(gray, orient, pix_per_cell, cell_per_block, feature_vec=False)
    
    bboxes = []
    for xb in range(nxsteps):
        for yb in range(nysteps):
            ypos = yb * cells_per_step
            xpos = xb * cells_per_step
            
            # Extract HOG for this patch.
            hog_features = hog[ypos:ypos + nblocks_per_window, xpos:xpos + nblocks_per_window].ravel() 

            xleft = xpos * pix_per_cell
            ytop = ypos * pix_per_cell

            # Extract the image patch.
            subimg = cv2.resize(ctrans_tosearch[ytop:ytop + window, xleft:xleft + window, :], (64, 64))
            # Get color features for the image patch.
            spatial_features = bin_spatial_ls(subimg, size=spatial_size)

            # Scale features and make a prediction
            test_features = X_scaler.transform(np.hstack((spatial_features, hog_features)).reshape(1, -1))
            test_prediction = clf.predict_proba(test_features)[0, 1]
            
            # If a car was detected, record its bounding box.
            if test_prediction > PROB_THRESH:
                xbox_left = int(xleft * scale)
                ytop_draw = int(ytop * scale)
                win_draw = int(window * scale)
                bboxes.append(((xbox_left, ytop_draw + ystart),
                              (xbox_left + win_draw, ytop_draw + win_draw + ystart)))
                
    return bboxes
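As a worked check of the block arithmetic in find_cars(), here is the window count for a 1280-pixel-wide frame searched over y = 380..700 at scale 1 (an illustration with assumed frame dimensions):

```python
# Worked example of the window arithmetic in find_cars() for a
# 1280-wide frame, y range 380..700, scale 1.
pix_per_cell, cell_per_block, cells_per_step, window = 8, 2, 1, 64
height, width = 700 - 380, 1280

nxblocks = width // pix_per_cell - cell_per_block + 1             # 159
nyblocks = height // pix_per_cell - cell_per_block + 1            # 39
nblocks_per_window = window // pix_per_cell - cell_per_block + 1  # 7
nxsteps = (nxblocks - nblocks_per_window) // cells_per_step + 1   # 153
nysteps = (nyblocks - nblocks_per_window) // cells_per_step + 1   # 33
print(nxsteps * nysteps)  # 5049 windows at this scale alone
```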
In [20]:
YSTART = 380  # No cars in the trees; Udacity students haven't developed a flying car company yet!
YSTOP = 700  # Should be YSTART + a multiple of `window` (defined in `find_cars()` above)
SCALES = [1, 2., 3.]

# Plot vehicle detections for all test images.
plt.figure(figsize=(15, 30))
images = glob.glob('./test_images/*.jpg')
for i, path in enumerate(images):
    orig_img = plt.imread(path)
    img = orig_img.astype(np.float32)/255  # Required for JPEGs.

    # Search for vehicles at all scales.
    bboxes = []
    for scale in SCALES:
        bboxes.extend(find_cars(img, YSTART, YSTOP, scale, clf, X_scaler, ORIENT, PIX_PER_CELL, CELL_PER_BLOCK,
                      (SPATIAL, SPATIAL)))

    plt.subplot(len(images), 2, 2 * i + 1).set_title('Original ' + path)
    plt.imshow(img)
    plt.subplot(len(images), 2, 2 * i + 2).set_title('Car Positions')
    out_img = draw_boxes(orig_img, bboxes, color=(0, 0, 255), thick=6)
    plt.imshow(out_img)
plt.show()
<matplotlib.figure.Figure at 0x7f86f053d2b0>

Woohoo! There are very few false positive identifications.

Depending on the run, the black car may or may not be identified, as it enters the shadow cast by the tree (./test_images/test5.jpg and ./test_images/test6.jpg). Hopefully collecting bounding boxes over multiple frames will make it possible to bridge the gap with this car.

7. Combine overlapping windows & eliminate false positives.

In [21]:
# Adapted from "Project: Vehicle Detection, 37. Multiple Detections & False Positives".

from scipy.ndimage import label

# Number of overlapping bounding boxes required to identify a labeled object as a car.
HEAT_THRESH = 1  # This will be redefined later, when considering consecutive video frames.

# Adds heat to a heatmap.
def add_heat(heatmap, bbox_list):
    # Iterate through list of bboxes
    for box in bbox_list:
        # Add += 1 for all pixels inside each bbox
        # Assuming each "box" takes the form ((x1, y1), (x2, y2))
        heatmap[box[0][1]:box[1][1], box[0][0]:box[1][0]] += 1

    # Return updated heatmap
    return heatmap

# Applies a threshold to a heat map by zeroing out pixels below the threshold.
def apply_threshold(heatmap, threshold):
    # Zero out pixels below the threshold
    heatmap[heatmap <= threshold] = 0
    # Return thresholded map
    return heatmap

# Draw a single bounding box around each vehicle identified in `labels`.
# This is done by finding, for each vehicle, all pixels associated with that
# vehicle and drawing a bounding box around all its pixels.
#
# `labels` is the output of the `label()` function, which is a tuple of
# labeled pixels and number of labels.
#   * The labels identify each detected vehicle
#   * The labeled pixels have the same shape as `img`, but each pixel is assigned
#     a detected vehicle label.
def draw_labeled_bboxes(img, labels):
    # Iterate through all detected cars
    for car_number in range(1, labels[1] + 1):
        # Find pixels with each car_number label value.
        nonzero = (labels[0] == car_number).nonzero()
        # Identify x and y values of those pixels.
        nonzeroy = np.array(nonzero[0])
        nonzerox = np.array(nonzero[1])
        # Define a bounding box based on min/max x and y.
        bbox = ((np.min(nonzerox), np.min(nonzeroy)), (np.max(nonzerox), np.max(nonzeroy)))
        # Draw the box on the image.
        cv2.rectangle(img, bbox[0], bbox[1], (0, 0, 255), 6)
    # Return the image.
    return img

# Draw a single bounding box around each detected vehicle in `image`,
# given a list of all windows identified as containing a vehicle.
def draw_bounding_boxes(image, bboxes_list):
    heat = np.zeros_like(image[:,:,0]).astype(np.float)

    # Add heat to each box in box list
    heat = add_heat(heat, bboxes_list)
    
    # Apply threshold to help remove false positives
    heat = apply_threshold(heat, HEAT_THRESH)

    # Visualize the heatmap when displaying    
    heatmap = np.clip(heat, 0, 255)

    # Find final boxes from heatmap using label function
    labels = label(heatmap)
    draw_img = draw_labeled_bboxes(np.copy(image), labels)
    return heatmap, draw_img

# Plot a single vehicle detection bounding box per detected vehicle and
# the pre-thresholding heat maps for all test images.
plt.figure(figsize=(15, 20))
images = glob.glob('./test_images/*.jpg')
for i, path in enumerate(images):
    orig_img = plt.imread(path)
    img = orig_img.astype(np.float32)/255  # Required for JPEGs.

    # Search for vehicles at all scales.
    bboxes = []
    for scale in SCALES:
        bboxes.extend(find_cars(img, YSTART, YSTOP, scale, clf, X_scaler, ORIENT, PIX_PER_CELL, CELL_PER_BLOCK,
                      (SPATIAL, SPATIAL)))

    plt.subplot(len(images), 3, 3 * i + 1).set_title('Original ' + path)
    plt.imshow(img)
    heatmap, out_img = draw_bounding_boxes(orig_img, bboxes)
    plt.subplot(len(images), 3, 3 * i + 2).set_title('Car Positions')
    plt.imshow(out_img)
    plt.subplot(len(images), 3, 3 * i + 3).set_title('Heat Map')
    plt.imshow(heatmap)
plt.show()

Applying a threshold to the heat map of overlapping bounding boxes eliminates the false positives and draws tight boxes around the cars in most cases (./test_images/test5.jpg, where the white car recedes out of the image, is an exception).
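The combine-and-threshold logic can be illustrated on a toy grid (synthetic boxes, numpy only):

```python
import numpy as np

def add_heat(heatmap, bbox_list):
    # Each box ((x1, y1), (x2, y2)) votes +1 over its pixels.
    for ((x1, y1), (x2, y2)) in bbox_list:
        heatmap[y1:y2, x1:x2] += 1
    return heatmap

heat = np.zeros((10, 10), dtype=int)
boxes = [((0, 0), (6, 6)), ((4, 4), (9, 9))]  # two overlapping detections
heat = add_heat(heat, boxes)
heat[heat <= 1] = 0  # threshold: keep only multiply-detected pixels
print(heat[5, 5], heat[1, 1])  # 2 0 -- only the overlap survives
```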

8. Generate final video output.

In [22]:
# Adapted from CarND-LaneLines-P1/P1.ipynb".

from moviepy.editor import VideoFileClip
from IPython.display import HTML

HEAT_THRESH = 10
FRAME_WINDOW = 20  # The number of frames to look back.
prev_bboxes = []  # Vehicle detections for all frames.

def process_image(get_frame, t):
    # NOTE: The output you return should be a color image (3 channel) for processing video below
    orig_img = get_frame(t)
    img = orig_img.astype(np.float32)/255  # Model was trained on PNGs, frames are JPEGs.
 
    # Search for vehicles at all scales.
    bboxes = []
    for scale in SCALES:
        bboxes.extend(find_cars(img, YSTART, YSTOP, scale, clf, X_scaler, ORIENT, PIX_PER_CELL, CELL_PER_BLOCK,
                                (SPATIAL, SPATIAL)))
    # Record raw vehicle detections.
    prev_bboxes.append(bboxes)
    
    # Prior to drawing tight bounding boxes, concatenate the lists of bounding boxes
    # found in the last FRAME_WINDOW frames.
    flat = [bbox for bboxes in prev_bboxes[-FRAME_WINDOW:] for bbox in bboxes]
    _, draw_img = draw_bounding_boxes(orig_img, flat)
    return draw_img

output = './output_images/project_video.mp4'
## To speed up the testing process you may want to try your pipeline on a shorter subclip of the video
## To do so add .subclip(start_second,end_second) to the end of the line below
clip = VideoFileClip('./project_video.mp4')
output_clip = clip.fl(process_image)
%time output_clip.write_videofile(output, audio=False)
[MoviePy] >>>> Building video ./output_images/black_car_video.mp4
[MoviePy] Writing video ./output_images/black_car_video.mp4
100%|█████████▉| 1260/1261 [11:00:49<00:32, 32.02s/it]  
[MoviePy] Done.
[MoviePy] >>>> Video ready: ./output_images/black_car_video.mp4 

CPU times: user 11h 1min 54s, sys: 4.68 s, total: 11h 1min 59s
Wall time: 11h 49s
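One caveat with the pipeline above: prev_bboxes grows without bound over a long video, even though only the last FRAME_WINDOW entries are ever read. A collections.deque with maxlen would cap memory; a sketch of the idea (not what the notebook actually ran):

```python
from collections import deque

FRAME_WINDOW = 20
prev_bboxes = deque(maxlen=FRAME_WINDOW)  # old frames fall off automatically

for frame in range(100):
    prev_bboxes.append([((0, 0), (1, 1))])  # stand-in for one frame's detections

# Only the most recent FRAME_WINDOW frames are retained.
flat = [bbox for bboxes in prev_bboxes for bbox in bboxes]
print(len(prev_bboxes), len(flat))  # 20 20
```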
In [25]:
HTML("""
<video width="960" height="540" controls>
  <source src="{0}">
</video>
""".format(output))
Out[25]: